System and method for determining quality metrics for a question set

ABSTRACT

A computer-implemented method, computer program product, and system are provided for determining quality metrics for a question set. In an implementation, a test question set model may be produced based upon calculated quality metrics of a test question set with respect to a test corpus, and including features representing quality metrics. The test question set model may be compared to a baseline question set model based on a distance calculated between one or more projected model features of the baseline question set model and one or more runtime model features of the test question set model. Contents of the test question set may be adjusted based upon the calculated distance.

TECHNICAL FIELD

The present disclosure generally relates to coverage of question sets on a corpus, and more particularly relates to systems and methods for determining quality metrics for a question set.

BACKGROUND

Question answering systems may rely heavily on adequate question sets to test the question answering systems. Question sets may also be used to train the question answering system for better results. An adequate question set should accurately test and cover a particular domain with a broad range of diverse questions.

SUMMARY

According to an implementation, a computer-implemented method may include producing, by a processor, a test question set model based upon, at least in part, calculated quality metrics of a test question set with respect to a test corpus, and including a plurality of test question set model features representing quality metrics for the test question set. The method may also include comparing, by the processor, the test question set model to a baseline question set model based on calculating a distance between one or more projected model features of the baseline question set model and one or more runtime model features of the test question set model. The method may also include adjusting, by the processor, contents of the test question set based upon, at least in part, the calculated distance between the projected model features of the baseline question set model and the runtime model features of the test question set model.

One or more of the following features may be included. The baseline question set model may be produced based on calculated quality metrics of a baseline question set with respect to a baseline corpus and includes a plurality of baseline question set model features representing quality metrics for the baseline question set. Machine learning may be applied to tune the test question set model by rewarding prominent features of the test question set and penalizing less prominent features of the test question set. The baseline question set model may be selected based upon, at least in part, a domain distance between the baseline corpus and the test corpus. The calculated quality metrics for the test question set model may be calculated using a static question set analysis tool.

The method may further include projecting the test question set accuracy from the runtime model features of the baseline question set by analyzing the distance between the baseline question set model and the test question set model. The method may further include identifying a level of coverage for the test question set. The method may further include identifying a level of non-coverage for the test question set.

According to another implementation, a computer program product may include a non-transitory computer readable medium having a plurality of instructions stored on it. When executed by a processor, the instructions may cause the processor to perform operations including producing a test question set model based upon, at least in part, calculated quality metrics of a test question set with respect to a test corpus, and including a plurality of test question set model features representing quality metrics for the test question set. Instructions may also be included for comparing the test question set model to a baseline question set model based on calculating a distance between one or more projected model features of the baseline question set model and one or more runtime model features of the test question set model. Instructions may also be included for adjusting contents of the test question set based upon, at least in part, the calculated distance between the projected model features of the baseline question set model and the runtime model features of the test question set model.

According to yet another implementation, a system may include at least one processor device and at least one memory architecture coupled with the at least one processor device. The at least one processor device may be configured for producing a test question set model based upon, at least in part, calculated quality metrics of a test question set with respect to a test corpus, and including a plurality of test question set model features representing quality metrics for the test question set. The at least one processor device may also be configured for comparing the test question set model to a baseline question set model based on calculating a distance between one or more projected model features of the baseline question set model and one or more runtime model features of the test question set model. The at least one processor device may also be configured for adjusting contents of the test question set based upon, at least in part, the calculated distance between the projected model features of the baseline question set model and the runtime model features of the test question set model.

The details of one or more implementations are set forth in the accompanying drawings and the description below. Other features and advantages will become apparent from the description, the drawings, and the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagrammatic view of a distributed computing network including a computing device that executes a metric quality process according to an implementation of the present disclosure;

FIG. 2 is a diagrammatic view of a question set, a corpus, and answers according to an implementation of the present disclosure;

FIG. 3 is a diagrammatic view of a baseline question set and a test question set for a shared baseline corpus according to an implementation of the present disclosure;

FIG. 4 is a diagrammatic view of a baseline question set and a test question set for a separate baseline corpus and test corpus according to an implementation of the present disclosure;

FIG. 5 is a flowchart of the metric quality process of FIG. 1, according to an implementation of the present disclosure; and

FIG. 6 is a diagrammatic view of the computing device of FIG. 1, according to an implementation of the present disclosure.

Like reference symbols in the various drawings indicate like elements.

DETAILED DESCRIPTION

Referring to FIG. 1, there is shown metric quality process 10. For the following discussion, it is intended to be understood that metric quality process 10 may be implemented in a variety of ways. For example, metric quality process 10 may be implemented as a server-side process, a client-side process, or a server-side/client-side process.

For example, metric quality process 10 may be implemented as a purely server-side process via metric quality process 10s. Alternatively, metric quality process 10 may be implemented as a purely client-side process via one or more of client-side application 10c1, client-side application 10c2, client-side application 10c3, and client-side application 10c4. Alternatively still, metric quality process 10 may be implemented as a server-side/client-side process via metric quality process 10s in combination with one or more of client-side application 10c1, client-side application 10c2, client-side application 10c3, and client-side application 10c4. In such an example, at least a portion of the functionality of metric quality process 10 may be performed by metric quality process 10s and at least a portion of the functionality of metric quality process 10 may be performed by one or more of client-side applications 10c1, 10c2, 10c3, and 10c4.

Accordingly, metric quality process 10 as used in this disclosure may include any combination of metric quality process 10s, client-side application 10c1, client-side application 10c2, client-side application 10c3, and client-side application 10c4.

FIG. 2 depicts an example of a question set 60 that can include a plurality of questions 62 (e.g., Q1 to Qn) in a question answering computer system. A corpus 64 can include various documents 66, fragments 68 (e.g., web pages), and/or passages 70 related to a domain D from which answers are desired. The domain D may be associated with a particular field of interest, such as medical information, insurance coding, and the like. Portions of the corpus 64 may be tagged with identifiers 72 used to construct candidate answers 74. An answer selection process that is known in the art can determine a selected answer 76 from the candidate answers 74. The question set 60 may include thousands of questions (e.g., tens of thousands, hundreds of thousands, etc.). In embodiments, the metric quality process 10 can be used to determine whether the questions 62 within the question set 60 provide a sufficient level of coverage of the corpus 64. For example, in some applications, the corpus 64 must provide 100% coverage for all questions 62 contained in the question set 60. In other applications, the level of coverage is deemed sufficient if 70% of the questions 62 contained in the question set 60 can be answered correctly using the corpus 64.
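
By way of illustration only (not part of the disclosed method), such a sufficiency check might be sketched as follows in Python; the list-of-booleans representation of per-question results and the function name are assumptions introduced here:

    def coverage_is_sufficient(answered_correctly, required_level):
        """Hypothetical check of whether a question set meets an
        application-specific coverage requirement.

        answered_correctly: one boolean per question in the set, True
        if the question can be answered correctly using the corpus.
        required_level: e.g., 1.0 for a strict application, 0.7 for an
        application that tolerates partial coverage.
        """
        if not answered_correctly:
            return False
        rate = sum(answered_correctly) / len(answered_correctly)
        return rate >= required_level

    # Example: 7 of 10 questions answered correctly satisfies a 70% bar.
    print(coverage_is_sufficient([True] * 7 + [False] * 3, 0.70))  # True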

FIG. 3 depicts an example of a baseline question set 78 and a test question set 80 that are analyzed with respect to a same baseline corpus 82. A static question set analysis tool 84 can be used to determine a plurality of metrics of the baseline question set 78 with respect to the baseline corpus 82 as a baseline question set model 86. The baseline corpus 82 may also be a test corpus with respect to the test question set 80. The static question set analysis tool 84 can be used to determine a plurality of metrics of the test question set 80 with respect to the test corpus (i.e., baseline corpus 82 in this example) as a test question set model 88. The contents of the test question set 80 differ from the baseline question set 78. The test question set model 88 can be compared to the baseline question set model 86 to determine how similar the performance of the test question set 80 is to the baseline question set 78. For example, coverage of the test question set 80 may be deemed higher or lower than coverage provided by the baseline question set 78. A question bank 90 can be accessed to add, remove, or update questions from the test question set 80, and a new iteration of the test question set model 88 can be run for further comparison against the baseline question set model 86.

In the example of FIG. 4, the test question set 80 is analyzed with respect to test corpus 92 to determine test question set model 88. The test corpus 92 is a different corpus and may be from a different domain than the baseline corpus 82. Even though the domains may be different for baseline corpus 82 and test corpus 92, the metric quality process 10 enables tuning of the test question set 80 by adding, removing, and/or updating question content, for example, to align metrics of the test question set model 88 with the baseline question set model 86.

Referring also to FIG. 5 with continued reference to FIGS. 1-4, and as will be discussed in greater detail below, metric quality process 10 may produce 100 a baseline question set model 86 based on calculated quality metrics of a baseline question set 78 with respect to a baseline corpus 82, including a plurality of baseline question set model features representing quality metrics for the baseline question set 78. Metric quality process 10 may also produce 102 a test question set model 88 based upon, at least in part, calculated metrics of a test question set 80 with respect to a test corpus 92 (which in some embodiments is equivalent to baseline corpus 82), including a plurality of test question set model features representing quality metrics for the test question set 80. Metric quality process 10 may further compare 106 the test question set model 88 to the baseline question set model 86 based on calculating a distance between one or more projected model features of the baseline question set model 86 and one or more runtime model features of the test question set model 88. Metric quality process 10 may also adjust 108 contents (e.g., questions) of the test question set 80 based upon, at least in part, the calculated distance between the projected model features of the baseline question set model 86 and the runtime model features of the test question set model 88.

Metric quality process 10s may be a server application and may reside on and may be executed by computing device 12, which may be connected to network 14 (e.g., the Internet or a local area network). Examples of computing device 12 may include, but are not limited to: a personal computer, a server computer, a series of server computers, a minicomputer, a mainframe computer, or a dedicated network device.

The instruction sets and subroutines of metric quality process 10s, which may be stored on storage device 16 coupled to computing device 12, may be executed by one or more processors (not shown) and one or more memory architectures (not shown) included within computing device 12. Examples of storage device 16 may include but are not limited to: a hard disk drive; a tape drive; an optical drive; a RAID device; an NAS device; a storage area network; a random access memory (RAM); a read-only memory (ROM); and all forms of flash memory storage devices.

Network 14 may be connected to one or more secondary networks (e.g., network 18), examples of which may include but are not limited to: a local area network; a wide area network; or an intranet, for example.

Examples of client-side applications 10c1, 10c2, 10c3, 10c4 may include but are not limited to a web browser, or a specialized application (e.g., an application running on a mobile platform). The instruction sets and subroutines of client-side applications 10c1, 10c2, 10c3, 10c4, which may be stored on storage devices 20, 22, 24, 26 (respectively) coupled to client electronic devices 28, 30, 32, 34 (respectively), may be executed by one or more processors (not shown) and one or more memory architectures (not shown) incorporated into client electronic devices 28, 30, 32, 34 (respectively). Examples of storage devices 20, 22, 24, 26 may include but are not limited to: hard disk drives; tape drives; optical drives; RAID devices; random access memories (RAM); read-only memories (ROM); and all forms of flash memory storage devices.

Examples of client electronic devices 28, 30, 32, 34 may include, but are not limited to, personal computer 28, laptop computer 30, mobile computing device 32, notebook computer 34, a netbook computer (not shown), a server computer (not shown), a gaming console (not shown), a data-enabled television console (not shown), and a dedicated network device (not shown). Client electronic devices 28, 30, 32, 34 may each execute an operating system.

Users 36, 38, 40, 42 may access metric quality process 10 directly through network 14 or through secondary network 18. Further, metric quality process 10 may be accessed through secondary network 18 via link line 44.

The various client electronic devices (e.g., client electronic devices 28, 30, 32, 34) may be directly or indirectly coupled to network 14 (or network 18). For example, personal computer 28 is shown directly coupled to network 14. Further, laptop computer 30 is shown wirelessly coupled to network 14 via wireless communication channel 44 established between laptop computer 30 and wireless access point (WAP) 48. Similarly, mobile computing device 32 is shown wirelessly coupled to network 14 via wireless communication channel 46 established between mobile computing device 32 and cellular network/bridge 50, which is shown directly coupled to network 14. WAP 48 may be, for example, an IEEE 802.11a, 802.11b, 802.11g, 802.11n, Wi-Fi, and/or Bluetooth device that is capable of establishing wireless communication channel 44 between laptop computer 30 and WAP 48. Additionally, notebook computer 34 is shown directly coupled to network 18 via a hardwired network connection.

As generally discussed above with reference to FIG. 5, metric quality process 10 may define 100 a baseline question set model based upon, at least in part, a baseline question set for a baseline corpus, and including a plurality of baseline question set model features representing quality metrics for the baseline question set. Metric quality process 10 may also produce 102 a test question set model based upon, at least in part, calculated metrics for a test question set and a test corpus, and including a plurality of test question set model features representing quality metrics for the test question set. Metric quality process 10 may also use 104 the test question set model as a test dataset in applying the baseline question set model to the test corpus. Metric quality process 10 may further calculate 106 a distance between one or more projected model features of the baseline question set model and one or more runtime model features of the test question set model. Metric quality process 10 may also adjust 108 the test question set based upon, at least in part, the calculated distance between the projected model features of the baseline question set model and the runtime model features of the test question set model. Metric quality process 10 may further apply 110 machine learning to tune the test question set based upon, at least in part, the baseline question set model and the test question set model.

A corpus may generally be defined as a collection of written texts. More particularly, a corpus may include a systematic collection of naturally occurring texts, including both written and spoken language. The structure and contents of the corpus may be restricted to particular text types, to one or more varieties of English (or other languages), and/or to certain time periods. Any number of corpora may exist based upon, at least in part, the structure and contents of the particular corpus. Within a question answering system, a question set may be used to produce answers based upon, at least in part, text from a corpus associated with the question answering system. A question set to be used to test a particular corpus associated with the question answering system may be produced in a variety of ways. For example, a question set may be consumer or user provided. A question set may be generated by an automation tool from a template based upon, at least in part, the corpus. A question set may also be manually input by corpus domain experts. This is not meant to be a limitation of this disclosure, as the question set may be produced and/or provided in a variety of manners. Because the question set may be used to extract answers from the corpus, the question set desirably is robust in nature and properly covers each aspect of the particular corpus.

The baseline corpus 82 may include a single corpus and/or may include more than one corpus. As described above, a baseline question set 78 of the baseline question set model 86 may be provided and/or produced for a baseline corpus 82. As the baseline question set 78 is used to test the baseline corpus 82, baseline accuracy and competency level metrics may be produced for the baseline question set 78, as examples of features of the baseline question set model 86 that may represent quality metrics for the baseline question set 78. For example, for each question included within the baseline question set 78, an amount of content included within the baseline corpus 82 “touched” by the question may be identified. That is, any content within the baseline corpus 82 that may be related to the question and/or may at least partially answer the question may be identified and considered a possible candidate answer. Candidate answers may be associated with a unique identifier (e.g., using identifiers 72) of that portion of the baseline corpus 82. In this regard, when texts (e.g., written, spoken, etc.) are incorporated into a corpus, each text, as well as each subpart of the text (e.g., any subpart such as a chapter, section, page, paragraph, sentence, or phrase), may be assigned a unique identifier. The accuracy and competency of the candidate answers in relation to each question (i.e., how well the candidate answer answers the question) may be used to produce baseline accuracy and competency level metrics (e.g., which may singly and/or collectively represent quality metrics) associated with the baseline question set 78. For example, in an embodiment, a question and answer system may be used to estimate the accuracy of a given question set by evaluating the performance of the system against a ground truth, which may contain answers to the questions in the question set. Metric quality process 10 may use the baseline accuracy and competency level metrics to produce or define the baseline question set model 86. The baseline question set model 86 may be defined with a set of features. The set of features may include metrics, such as the baseline accuracy and competency level metrics, that may be produced in the context of a given corpus (e.g., the baseline corpus 82). The baseline question set model 86 may include the baseline accuracy and competency level metrics, e.g., as baseline question set model features representing quality metrics for the baseline question set 78.
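
As an illustrative sketch of the ground-truth evaluation described above (the dictionary representation keyed by question is an assumption, not taken from the disclosure), accuracy might be estimated by comparing the system's selected answers against a ground truth:

    def estimate_accuracy(selected_answers, ground_truth):
        """selected_answers and ground_truth each map a question id to
        an answer (e.g., the unique identifier of a corpus passage);
        returns the fraction of questions answered correctly."""
        if not ground_truth:
            return 0.0
        correct = sum(1 for question, answer in selected_answers.items()
                      if ground_truth.get(question) == answer)
        return correct / len(ground_truth)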

The test corpus 92 may include the same corpus as the baseline corpus 82, may include a modification of the baseline corpus 82, or may include a different corpus (e.g., which may be related and/or unrelated to the baseline corpus 82). As with the baseline corpus 82, the test corpus 92 may include a single corpus and/or may include more than one corpus. The calculated quality metrics may include, for example, corpus coverage metrics, corpus non-coverage metrics, a weak coverage rate, an accuracy rate of the test question set 80, a recall rate of the test question set 80, and breadth and depth metrics (e.g., the size of the vocabulary and grammar of the corpus). This list of calculated quality metrics is not meant to be a limitation of this disclosure, as other possible calculated quality metrics may be included. Singly, or collectively, the calculated quality metrics may include test question set model features representing quality metrics for the test question set 80.

Each metric may be calculated using the static question set analysis tool 84. The static question set analysis tool 84 may calculate quality metrics for the test question set 80 and the test corpus 92 to produce the test question set model 88. For example, for each question included within the test question set 80, possible candidate answers with associated unique identifiers may be selected from the test corpus 92. Metric quality process 10 may also identify how much of the test corpus 92 has multiple cross-coverage questions, including identifying how much of the test corpus 92 is “touched” by the test question set 80. In this manner, metric quality process 10 may identify how much of the test corpus 92 is “covered” and how much of the test corpus 92 is “not covered” by the given question set. In some implementations, the degree of coverage, degree of non-coverage, and degree of multiple cross-coverage may be represented as a heat map, e.g., which may generally indicate the number of questions covering each portion of the test corpus 92.
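
A minimal sketch of how such coverage bookkeeping might be implemented, assuming each question contributes a set of “touched” unique identifiers (the data layout and names are illustrative assumptions):

    from collections import Counter

    def touch_counts(candidate_ids_per_question):
        """Count how many questions touch each unique identifier; the
        resulting counts are the raw data behind a coverage heat map."""
        counts = Counter()
        for ids in candidate_ids_per_question:
            counts.update(ids)
        return counts

    def partition_corpus(corpus_ids, counts):
        """Split corpus identifiers into covered, multiply
        cross-covered, and not-covered groups."""
        covered = {i for i in corpus_ids if counts[i] >= 1}
        cross_covered = {i for i in corpus_ids if counts[i] >= 2}
        not_covered = set(corpus_ids) - covered
        return covered, cross_covered, not_covered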

As generally discussed above, metric quality process 10 may identify a level of coverage for the test question set 80. Using the static question set analysis tool 84, metric quality process 10 may calculate a coverage rate of the given question set by dividing a total number of unique identifiers selected from a corpus for the given question set by the total number of unique identifiers in the corpus (i.e., the total number of possible candidate answers included within the corpus). In this manner, metric quality process 10 may identify the level of coverage for the given question set. As generally discussed above, the unique identifiers (e.g., identifiers 72) may include identifiers associated with each text and/or subpart (e.g., chapter, section, page, paragraph, sentence, phrase, etc.) of each text included within the test corpus 92. The unique identifiers may be associated with each text and/or subpart of each text when the text is initially incorporated into the corpus, and/or at another time.

Metric quality process 10 may further identify a level of non-coverage for the test question set 80. Using the static question set analysis tool 84, metric quality process 10 may further calculate a non-coverage rate of the test question set 80 by dividing a number of unique identifiers remaining that were not selected from the test corpus 92 for the test question set 80 by the total number of unique identifiers in the test corpus 92. In this manner, metric quality process 10 may identify the level of non-coverage for the test question set 80.
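
A sketch combining the coverage and non-coverage calculations from the two preceding paragraphs, assuming the selected and corpus-wide unique identifiers are available as sets (illustrative only):

    def coverage_rate(selected_ids, corpus_ids):
        """Unique identifiers selected for the question set divided by
        the total unique identifiers in the corpus."""
        return len(set(selected_ids)) / len(set(corpus_ids))

    def non_coverage_rate(selected_ids, corpus_ids):
        """Unique identifiers not selected from the corpus divided by
        the total unique identifiers in the corpus."""
        remaining = set(corpus_ids) - set(selected_ids)
        return len(remaining) / len(set(corpus_ids))

    # The two rates are complementary: they sum to 1.0 for any corpus.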

Further, metric quality process 10 may calculate a weak coverage rate for the test question set 80 by calculating a percentage of unique identifiers that may be mapped to failed questions (i.e., questions of the test question set 80 that failed to “touch” an answer from the test corpus 92).
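
One plausible reading of the weak coverage rate, sketched under the assumption that a mapping from each unique identifier to the questions touching it is available (both the mapping and this interpretation are assumptions):

    def weak_coverage_rate(id_to_questions, failed_questions, corpus_ids):
        """Percentage of unique identifiers mapped only to failed
        questions, i.e., questions that failed to "touch" an answer.
        id_to_questions: dict mapping an identifier to a set of
        question ids."""
        weak = [i for i in corpus_ids
                if id_to_questions.get(i)
                and id_to_questions[i] <= set(failed_questions)]
        return 100.0 * len(weak) / len(corpus_ids)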

Metric quality process 10 may also calculate an accuracy rate of the test question set 80 by dividing a number of correct answers by a number of all possible answers from the test corpus 92. Metric quality process 10 may further calculate a recall rate of the test question set 80 by dividing a number of correct answers by a number of all correct answers from the test corpus 92.
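
Expressed directly as code (a sketch; the counts are assumed to be available from the analysis described above):

    def accuracy_rate(num_correct, num_possible_answers):
        """Correct answers divided by all possible answers from the
        test corpus."""
        return num_correct / num_possible_answers

    def recall_rate(num_correct, num_correct_in_corpus):
        """Correct answers divided by all correct answers contained in
        the test corpus."""
        return num_correct / num_correct_in_corpus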

The baseline corpus 82 may be selected based upon, at least in part, an optimal domain distance calculated between the baseline corpus 82 and the test corpus 92. For example, in an embodiment, the domain distance may be calculated by establishing the domain hierarchy that the application would need to support. For example, in the insurance domain, car insurance corpora from various insurance agencies may have a relatively small domain distance. As such, the baseline question set 78 may be selected from the closest corpus in the domain hierarchy. In an embodiment, a graph may be created that may allow the number of nodes to be counted between the different domains. The less distant the baseline corpus 82 is from the test corpus 92, the more accurate the metric comparison between the baseline question set model 86 and the test question set model 88. The domain distance may be included within the test question set model 88.
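
The node-counting idea might be realized as a breadth-first search over an undirected graph of the domain hierarchy; the graph representation below is an assumption used only for illustration:

    from collections import deque

    def domain_distance(hierarchy, source, target):
        """hierarchy: dict mapping a domain to its adjacent domains.
        Returns the number of edges between source and target, or None
        if the domains are not connected."""
        seen, queue = {source}, deque([(source, 0)])
        while queue:
            domain, distance = queue.popleft()
            if domain == target:
                return distance
            for neighbor in hierarchy.get(domain, ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append((neighbor, distance + 1))
        return None

    # Example: two car insurance corpora sharing a parent domain are
    # two hops apart, i.e., relatively close in the hierarchy.
    hierarchy = {"insurance": ["car insurance A", "car insurance B"],
                 "car insurance A": ["insurance"],
                 "car insurance B": ["insurance"]}
    print(domain_distance(hierarchy, "car insurance A", "car insurance B"))  # 2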

In an example embodiment, the test question set model 88 may be built for a given test question set 80. The question set static metrics may be calculated, and the test question set 80 may be run through a quality assurance system to calculate the performance metrics for a test dataset. The static metrics and performance results produced may be compared with baseline metrics. If the distance is not acceptable, the test question set 80 may be modified and the next iteration (i.e., including the modified test question set) may be run through the quality assurance system. Where the baseline question set model 86 and the test question set model 88 are implemented as vectors, a vector distance calculation can be used to compare the baseline question set model 86 and the test question set model 88. The baseline question set metrics may be used to calculate the difference metrics vector. The difference metrics vector values may be used to evaluate whether the test question set model 88 is qualified, i.e., has acceptable predicted performance, for the test corpus 92. Metric quality process 10 may further calculate a distance between one or more projected model features of the baseline question set model 86 and one or more runtime model features of the test question set model 88. Calculating the distance between the projected model features of the baseline question set model 86 and the runtime model features of the test question set model 88 may include analyzing the distance between the model features of the baseline question set model 86 (i.e., which may represent quality metrics for the baseline question set 78) and the model features of the test question set model 88 obtained at runtime (e.g., assessing the ground truth answers to the test question set 80 relative to the test question set model features calculated using static analysis). The baseline question set model 86 may be used for comparison with the test question set model 88 to evaluate the calculated quality metrics above. This may be beneficial to produce qualifying training and testing of the given question set during new domain/corpus adaptation and to test changes to question answering systems. The baseline accuracy and competency level metrics included within the baseline question set model 86 may be compared to the calculated quality metrics of the test question set model 88. This may help predict the accuracy of the given question set. For example, calculating the distance between the projected accuracy and the runtime accuracy between the test question set model 88 and the baseline question set model 86 may include calculating how far the calculated quality metrics are from the baseline accuracy and competency level metrics included within the baseline question set model 86 in reaching a baseline goal of 70% accuracy (and/or any other goal established as a desired accuracy level). Metric quality process 10 may further determine whether the calculated quality metrics meet specific application requirements. For example, 100% corpus coverage of a question set may be a requirement for a “quiz type” of question answering system. However, lesser corpus coverage may be suitable for other question answering systems.
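
Where the models are implemented as vectors, as noted above, the comparison reduces to ordinary vector arithmetic. The following sketch assumes both models expose the same features in the same order; the feature layout and the acceptance threshold are illustrative assumptions:

    import math

    def difference_vector(baseline_features, test_features):
        """Per-feature differences between the two models."""
        return [t - b for b, t in zip(baseline_features, test_features)]

    def model_distance(baseline_features, test_features):
        """Euclidean distance between the two feature vectors."""
        return math.sqrt(sum(d * d
                             for d in difference_vector(baseline_features,
                                                        test_features)))

    # Example: features might be (coverage, accuracy, recall). If the
    # distance exceeds a chosen threshold, the test question set is
    # modified and the next iteration is run.
    baseline = [0.90, 0.70, 0.65]
    test = [0.80, 0.60, 0.60]
    print(model_distance(baseline, test) <= 0.1)  # False: adjust and retry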

Metric quality process 10 may adjust the contents of the test question set 80 based upon, at least in part, the calculated distance between the projected model features of the baseline question set model 86 and the runtime model features of the test question set model 88. For example, new questions may be added to the given question set. Further, questions may be removed from the given question set. Further, individual questions within the test question set 80 may be modified. In this manner, the test question set 80 may be adjusted to more accurately test as much of the test corpus 92 as possible.

Metric quality process 10 may apply machine learning to tune the test question set 80 based upon the baseline question set model 86 and the test question set model 88. For example, various known machine learning algorithms, such as a logistic regression algorithm, may be applied to tuning the test question set model 88. The different question sets may include questions that are different from the questions included within the baseline question set 78. By way of example, tuning the test question set model 88 may work to establish or identify a common language (e.g., a common model) for comparing different corpora. This common language may be generated by a training process that may take as input the accuracy and competency metrics (e.g., the model features of the test question set model 88 and the baseline question set model 86) and/or the difference between the accuracy and competency metrics. Thus, the accuracy and competency level metrics may not be affected by the model generation, but may affect the model generation. As described above, the different question sets may be produced and/or provided in a variety of manners, including but not limited to, user provided, automation tool generated, manually provided by domain experts, etc. The different question sets may be used to extract answers from the test corpus 92. Different questions from different question sets may extract different candidate answers from the test corpus 92. Different question sets may include different questions and/or may include rephrased questions from the test question set 80. In a similar manner as described above, for each question included within the different question sets, an amount of content included within the test corpus “touched” by the question in the different question sets may be identified. That is, any content within the test corpus 92 that may be related to the question and/or at least partially answer the question may be identified as a possible candidate answer. As generally described above, possible candidate answers may be associated with a unique identifier. Metric quality process 10 may apply machine learning from the possible candidate answers associated with the different question sets for the test corpus 92 to train the test question set model 88. Applying 110 machine learning to tune the test question set model 88 using different question sets for the test corpus 92 may include rewarding prominent features of the test question set 80 and penalizing less prominent features of the test question set 80. For example, in an embodiment, machine learning models may, as part of their operation, decide on different weights (e.g., prominences) for the various features used as input; rewarding and penalizing features may thus follow from the application of the machine learning algorithm. Features of the test question set model 88 may include the accuracy and competency level metrics. In this way, the test question set model 88 may become more robust by introducing new questions from the different question sets that “touch” or “cover” other areas of the test corpus 92 and/or may introduce rephrased questions that may provide different answers from the test corpus 92. Further, the test question set accuracy may be projected from the runtime model features by analyzing the distance between the baseline question set model 86 and the test question set model 88.
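
As a hedged sketch of the logistic regression option mentioned above, a model might be fit on feature vectors from prior question set iterations, with the learned coefficients playing the role of rewarded (prominent) and penalized (less prominent) features. The feature layout, the labels, and the use of scikit-learn are assumptions made here for illustration:

    from sklearn.linear_model import LogisticRegression

    # Each row holds illustrative model features for one question set
    # iteration, e.g., [coverage_rate, accuracy_rate, recall_rate];
    # label 1 means the iteration met the baseline goal, 0 means it
    # did not. These values are synthetic.
    X = [[0.95, 0.72, 0.70],
         [0.60, 0.40, 0.35],
         [0.90, 0.75, 0.68],
         [0.55, 0.45, 0.50]]
    y = [1, 0, 1, 0]

    model = LogisticRegression().fit(X, y)
    # Larger positive weights mark "rewarded" prominent features;
    # small or negative weights mark "penalized" ones.
    print(model.coef_)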

Referring also to FIG. 6, there is shown a diagrammatic view of computing system 12. While computing system 12 is shown in this figure, this is for illustrative purposes only and is not intended to be a limitation of this disclosure, as other configurations are possible. For example, any computing device capable of executing, in whole or in part, metric quality process 10 may be substituted for computing device 12 within FIG. 6, examples of which may include but are not limited to client electronic devices 28, 30, 32, 34.

Computing system 12 may include microprocessor 200 configured to, e.g., process data and execute instructions/code for metric quality process 10. Microprocessor 200 may be coupled to storage device 16. As discussed above, examples of storage device 16 may include but are not limited to: a hard disk drive; a tape drive; an optical drive; a RAID device; an NAS device; a storage area network; a random access memory (RAM); a read-only memory (ROM); and all forms of flash memory storage devices. IO controller 202 may be configured to couple microprocessor 200 with various devices, such as keyboard 204, mouse 206, USB ports (not shown), and printer ports (not shown). Display adaptor 208 may be configured to couple display 210 (e.g., a CRT or LCD monitor) with microprocessor 200, while network adapter 212 (e.g., an Ethernet adapter) may be configured to couple microprocessor 200 to network 14 (e.g., the Internet or a local area network).

The present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.

The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.

Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.

Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.

Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.

These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.

The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present disclosure has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the disclosure in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the disclosure. The embodiment was chosen and described in order to best explain the principles of the disclosure and the practical application, and to enable others of ordinary skill in the art to understand the disclosure for various embodiments with various modifications as are suited to the particular use contemplated.

Having thus described the disclosure of the present application in detail and by reference to embodiments thereof, it will be apparent that modifications and variations are possible without departing from the scope of the disclosure defined in the appended claims.

1.-7. (canceled)
8. A computer program product comprising a non-transitory computer readable medium having a plurality of instructions stored thereon, which, when executed by a processor, cause the processor to perform operations including: producing a test question set model based upon, at least in part, calculated quality metrics of a test question set with respect to a test corpus, and including a plurality of test question set model features representing quality metrics for the test question set; comparing the test question set model to a baseline question set model based on calculating a distance between one or more projected model features of the baseline question set model and one or more runtime model features of the test question set model; adjusting contents of the test question set based upon, at least in part, the calculated distance between the projected model features of the baseline question set model and the runtime model features of the test question set model.

9. The computer program product of claim 8, wherein the baseline question set model is produced based on calculated quality metrics of a baseline question set with respect to a baseline corpus and includes a plurality of baseline question set model features representing quality metrics for the baseline question set.

10. The computer program product of claim 9, wherein the baseline question set model is selected based upon, at least in part, a domain distance between the baseline corpus and the test corpus.

11. The computer program product of claim 8, wherein the calculated quality metrics for the test question set model are calculated using a static question set analysis tool.

12. The computer program product of claim 8, further including instructions for: projecting the test question set accuracy from the runtime model features of the baseline question set by analyzing the distance between the baseline question set model and the test question set model.

13. The computer program product of claim 8, further including instructions for: applying machine learning to tune the test question set model by rewarding prominent features of the test question set and penalizing less prominent features of the test question set.

14. The computer program product of claim 8, further including instructions for: identifying a level of coverage for the test question set; and identifying a level of non-coverage for the test question set.

15. A system comprising: at least one processor device and at least one memory architecture coupled with the at least one processor device, the at least one processor device configured for: producing a test question set model based upon, at least in part, calculated quality metrics of a test question set with respect to a test corpus, and including a plurality of test question set model features representing quality metrics for the test question set; comparing the test question set model to a baseline question set model based on calculating a distance between one or more projected model features of the baseline question set model and one or more runtime model features of the test question set model; adjusting contents of the test question set based upon, at least in part, the calculated distance between the projected model features of the baseline question set model and the runtime model features of the test question set model.

16. The system of claim 15, wherein the baseline question set model is produced based on calculated quality metrics of a baseline question set with respect to a baseline corpus and includes a plurality of baseline question set model features representing quality metrics for the baseline question set.

17. The system of claim 16, wherein the baseline question set model is selected based upon, at least in part, a domain distance between the baseline corpus and the test corpus.

18. The system of claim 15, wherein the calculated quality metrics for the test question set model are calculated using a static question set analysis tool.

19. The system of claim 15, wherein the at least one processor device is further configured for: applying machine learning to tune the test question set model by rewarding prominent features of the test question set and penalizing less prominent features of the test question set.

20. The system of claim 15, wherein the at least one processor device is further configured for: identifying a level of coverage for the test question set; and identifying a level of non-coverage for the test question set.